Abstract: A prototype of digital frequency multiplexing electronics allowing the real time
monitoring of microwave kinetic inductance detector (MKID) arrays for mm-wave astronomy has been
developed. Thanks to frequency multiplexing, it can simultaneously monitor 400 pixels over a
500 MHz bandwidth and requires only two coaxial cables to instrument such a large array. The
chosen solution and the performance achieved are presented in this paper.
1. Introduction

Microwave kinetic inductance detectors (MKIDs) have proven to be a solid working alternative to
traditional bolometers for millimeter and sub-millimeter astronomy. MKIDs are composed of
high-quality superconducting resonant circuits electromagnetically coupled to a transmission
line. They are designed to resonate in the microwave domain [7]. For astronomical
applications, the resonances typically lie between 1 and 10 GHz and have loaded quality factors
around $Q_L = 10^5$, corresponding to a typical bandwidth $\Delta f = f/Q_L$ of 10 to 100 kHz. Since
the MKID resonant frequencies can be easily adjusted by layout design, it is possible to couple
a large number of MKIDs with different resonance frequencies to a single transmission line.
Indeed, a large number of MKIDs can naturally be read out by a frequency-based multiplexing
system with no loss of performance. In practice, the average frequency spacing between
resonators is between 1 and 2 MHz. Thus, in order to ensure the largest sky coverage and overall
signal to noise per unit of time with a reduced number of cable feedthroughs into the cryostat,
the analog bandwidth and the number of detectors (resonators) managed by the electronics must
be maximized. In this respect, we present here a building block for the NIKA camera that is
able to monitor 400 pixels simultaneously over a 500 MHz bandwidth.

2. Instrumentation methodology 

The instrumentation setup used for NIKA and its associated electronics are extensively described
elsewhere. In summary, the excitation frequency comb is generated at baseband in the electronics
using coordinate rotation digital computers (CORDIC), up-converted with an IQ mixer to the 1 to
10 GHz frequency range and injected into the resonator line. The returning, and thus modified,
frequency comb is down-converted and analyzed by channelized Digital Down Converters (DDC) to
determine the amplitude and phase of each tone. Aside from the need for a good signal to noise
ratio (SNR) over the whole chain, the first limitation on the number of MKIDs managed by this
solution is given by the bandwidths of the digital to analog converter (DAC) and of the analog
to digital converter (ADC). The second constraint comes from the limited computing power. For an
FPGA (Field Programmable Gate Array), the computational power is determined by the amount of
available user logic and multiplier blocks times their maximum running frequency. Thanks to the
inherent parallelism of FPGAs, this figure is much larger than for DSPs, which have only a few
Multiplier Accumulators.

Starting from the previous version, which was able to manage a line of 128 tones over a
bandwidth of 125 MHz, three solutions are possible to increase the multiplexing factor per line. The
first solution would be to juxtapose several of the previous electronic boards, each one managing
its share of the bandwidth (see figure 1).

Unfortunately, the analog filters required to separate each share of the bandwidth before
down-converting face such stringent separation requirements, in order to avoid crosstalk due to
image frequencies, that they cannot be built.

The second option is to use faster ADCs and DACs combined with a larger computing power
(FPGA) in order to directly cover a larger bandwidth. Along this path, two competing
approaches remain. The first, "obvious", solution is to directly generate the frequency comb at
twice the desired bandwidth and to perform channelized DDC on the ADC signal. Unfortunately,
due to the frequency limitation of state of the art FPGAs, this can only be achieved by
massively pipelining the design on both the excitation and the analysis sides, which makes it
extremely complicated.

The third option, which we have chosen, is to use modern DACs featuring a digital modulator
and an interpolator followed by very steep half-band filters for generating the excitation comb.
With these, the total frequency bandwidth to cover is split into smaller bands where the
frequency combs can be computed at a moderate frequency, digitally up-converted and filtered to
avoid unwanted spurious frequencies. Each band contribution is then summed before being
up-converted to the frequency band of interest by an IQ up-mixer. On the reception side, the
returning signal is down-converted to baseband and digitized by a fast ADC. The digitized signal
then goes through a polyphase filter bank with overlapping bands of equal bandwidth. This filter
separates the total bandwidth into five smaller frequency bands and down-converts each of them to
baseband. The sub-bands are chosen so as to match the excitation bands. The filter
outputs are fed to the corresponding channelized DDC in order to be analyzed. The benefit of this
architecture is to limit the massive pipelining to the polyphase filter part and thus to
dramatically reduce the amount of user logic required for the frequency comb generation and the
channelized DDC.

Figure 1. Overview of the setup using the juxtaposition of several electronics to monitor an MKID array.
Each electronics, generating the two frequency combs (each tone phase shifted by 90° between I and Q),
is followed by an IQ up-mixer. The excitation combs up-converted to high frequencies are summed and
the resulting signal is fed to a programmable attenuator for power adjustment. After passing through the
cryostat and the low noise boost amplifier, each share of the bandwidth is separated by highly selective
filters before passing through the down-mixers and being returned to the corresponding electronics.


3. Hardware development 

Following section 2, dedicated hardware, the New Iram KID ELectronics (NIKEL), able to manage
400 resonators over a bandwidth of 500 MHz, was developed. NIKEL is designed to
manage five adjacent bands of 100 MHz. This choice was driven by the capabilities of the chosen
DAC (AD9125 from Analog Devices). As shown in figure 2, the NIKEL electronic board is composed
of a central FPGA (labeled 'split'), which receives the 12 bit ADC (ADS5400 from Texas
Instruments) output data flow at 1 GSPS, and of five processing FPGAs (labeled 'proc'). Each of
the latter drives its associated DAC with the appropriate frequency comb, which can feature up to
80 tones. Each 'proc' FPGA is connected to the 'split' FPGA by two links. The first of these,
labeled 'fakeADC', is a 12 bit parallel LVDS link running at 250 MSPS that carries the part of
the bandwidth corresponding to the excitation signal. The second link, labeled 'GTX link',
periodically (at ~953 Hz) conveys the 80 DDC results over a 2 Gb/s serial link. The six FPGAs are
of the same type (Xilinx XC6VLX75T-2FFG484C). They provide a satisfactory amount of available
user logic, coupled with a sufficiently large Multiplier Accumulator (MAC) block count. They also
feature eight high speed serial links.

An additional slow speed DAC, driven by the 'split' FPGA, is implemented in order to
provide a ~500 Hz modulation signal. Since the board can be clocked with a
reference clock, a bidirectional port was provided to allow synchronization between several boards
performing acquisition on the same kilo-pixel camera. When using several NIKEL electronics,
one board must be configured as master and provide the synchronization signal, while the others
are configured as slaves and start their acquisition upon reception of this synchronization
signal.

Figure 2. The electronic board is composed of a central FPGA (labeled 'split'), which receives the 12 bit
ADC (ADS5400 from Texas Instruments) output data flow at 1 GSPS, and of five processing FPGAs (labeled
'proc'). Each of the latter drives its associated DAC with the appropriate frequency comb, which can feature
up to 80 tones. Each 'proc' FPGA is connected to the 'split' FPGA by two links. The first of these, labeled
'fakeADC', is a 12 bit parallel LVDS link running at 250 MSPS that carries the part of the bandwidth
corresponding to the excitation signal. The second link, labeled 'GTX link', periodically (at ~953 Hz)
conveys the 80 DDC results over a 2 Gb/s serial link. An additional slow speed DAC, driven by the
'split' FPGA, is implemented to provide a 500 Hz modulation signal. The communication with
the hardware is ensured via a USB2 capable micro-controller and an interface FPGA that accommodates
different voltage levels.

The communication with the hardware is ensured via a USB2 capable micro-controller and
an interface FPGA that accommodates the different voltage levels. It allows dynamic FPGA
reconfiguration, tone frequency adjustment and data readout.

A picture of the board can be seen in figure 3. It is a 14-layer PCB with dimensions of
184 mm x 153 mm. The inner dielectric layers are made of traditional FR4 epoxy, while the outer
layers consist of RO4350 high frequency circuit material [10], which has lower dielectric losses
and is therefore well suited to accommodate the 2 Gb/s serial links, the DAC outputs, which
provide samples at 1 GSPS, and the ADC input signal.



Figure 3. Picture of the NIKEL board. It is a 14-layer PCB with dimensions of 184 mm x 153 mm.



Due to the extensive FPGA resource usage and their running frequency (250 MHz), special
care was taken in designing the power supply of the electronic board. Indeed, each FPGA core
supply draws a current of about 5 A when all tones are activated. Thanks to the use of DC/DC
converters, the total current drawn from the input power supply is below 20 A, corresponding to a
maximum required total power of 100 W (or 0.25 W per channel).
4. Polyphase filter design and full chain simulation

As introduced in sections 2 and 3, the received signal, sampled at 1 GS/s, must be decomposed into
five 250 MS/s data streams, each stream having a useful bandwidth of 100 MHz, in order to cover
the full 0-500 MHz bandwidth. Frequency modulation/demodulation is a well documented digital
signal processing technique [11] for data transmission (channelization) or audio/image coding
applications. The classical techniques, based on the Discrete Fourier Transform (DFT), the
Modified DFT (MDFT) or cosine modulated filter banks, suffer from aliasing and data distortion,
mainly due to the critical sampling of the created sub-bands. Overlapping polyphase filter banks,
as described in [12], offer a computationally efficient solution and have only two drawbacks for
our application: the first sub-band has half the bandwidth of the others, and the last sub-band
(also half bandwidth) is not usable. The same technique can however be adapted to the desired
filter bank specification, with an acceptable increase in complexity.

4.1 Theoretical formulation 

Digital filter bank implementations are often non-intuitive, but are nevertheless composed of
simple successive digital signal processing blocks, re-arranged in different forms to increase
computing efficiency. The simplified processing for each band of the filter bank is as follows.
First, a frequency shift is performed to translate the band of interest around 0 Hz. Then a low
pass filter followed by decimation is applied to select the frequencies of interest. This section
describes the arrangement of the basic blocks involved in the specific processing used here.

The input data stream is a real signal, sampled at $F_{si} = 1$ GS/s, where four consecutive samples
are presented at the filter bank input at each system clock cycle (250 MHz), while the filter bank
outputs five different samples, one for each output band. The signal processing for each band k
(k=0..4) is done in five consecutive steps. An illustration is provided for band k=2 in figure 4 and
the operations are described hereafter:

1. Perform an input signal frequency shift of $-(2k+1) \cdot F_{si}/20$, where $F_{si}/20 = 50$ MHz. This
is obtained by multiplying the input signal by the complex exponential $e^{-j\pi(2k+1)n/10}$, where
n is the input signal sample index.

2. Filter the complex signal with a low pass Finite Impulse Response (FIR) filter having a
passband of $F_{si}/20$ and a maximum rejection after $F_{si}/20 + F_{si}/80 = F_{si}/16$.

3. Decimate the result by a factor of 4. The new data rate then becomes $F_{so} = F_{si}/4$ and the
resulting filtered signal bandwidth is $[-F_{so}/4, F_{so}/4]$.

4. Up-convert by $F_{so}/4$. In practice, this is realized by multiplying the previous signal by
$e^{j\pi m/2}$, where $m = n/4$ is the sample index of the decimated data stream. The resulting complex
signal covers the frequency band $[0, F_{so}/2]$.

5. Finally, keep only the real part of the complex signal. This adds the complex conjugate negative
frequency image in the frequency plane. The output real signal is then correctly sampled at
$F_{so}$ without aliasing.
A given tone c (frequency $f_{ck}$) located in the k-th band of the input data stream can have its
frequency expressed as $f_{ck} = k \cdot F_{si}/10 + f_c$. Due to the whole processing, it should be noted
that this tone will not appear in the k-th filter bank output at the frequency $f_c$, but at
$f_c + F_{so}/20$. Consequently, the tones used for KID excitation must present a frequency shift of
$-F_{so}/20$ with respect to the ones used for performing DDC on the returning signal provided by the filter.
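
The five steps above can be sketched numerically for a single band. The following is a minimal illustration only, assuming NumPy and a generic Blackman-windowed sinc low-pass filter in place of the actual FIR (the names `lowpass_taps`, `direct_band` and `peak_frequency` are hypothetical helpers, not part of the firmware). It shows a 205 MHz input tone ($f_c = 5$ MHz in band k=2) appearing near $f_c + F_{so}/20 = 17.5$ MHz at the band output.

```python
import numpy as np

F_SI = 1.0e9                 # input sampling rate (from the text)
DECIMATION = 4
F_SO = F_SI / DECIMATION     # output sampling rate, 250 MS/s


def lowpass_taps(num_taps=201, cutoff_hz=62.5e6):
    """Blackman-windowed sinc low-pass; an illustrative stand-in for the real FIR."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / F_SI * n) * np.blackman(num_taps)
    return h / h.sum()


def direct_band(x, k, taps):
    """Direct (inefficient) form of the processing for band k."""
    n = np.arange(len(x))
    shifted = x * np.exp(-1j * np.pi * (2 * k + 1) * n / 10)   # step 1: shift by -(2k+1)*F_si/20
    filtered = np.convolve(shifted, taps, mode="same")          # step 2: low-pass FIR
    decimated = filtered[::DECIMATION]                          # step 3: decimate by 4
    m = np.arange(len(decimated))
    up = decimated * np.exp(1j * np.pi * m / 2)                 # step 4: up-convert by F_so/4
    return up.real                                              # step 5: keep the real part


def peak_frequency(y, fs):
    """Frequency (Hz) of the strongest spectral line of a real signal."""
    spectrum = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    return np.fft.rfftfreq(len(y), d=1 / fs)[np.argmax(spectrum)]


n = np.arange(4000)
tone_205 = np.cos(2 * np.pi * 205e6 * n / F_SI)
band2 = direct_band(tone_205, k=2, taps=lowpass_taps())
peak_mhz = peak_frequency(band2, F_SO) / 1e6   # close to 17.5 MHz
```

The same call with a 250 MHz input tone ($f_c = 50$ MHz) gives a line near 62.5 MHz, again $f_c + F_{so}/20$.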

In practice, the FIR filter does not have to be as steep as noted in step 2, given that any
aliasing causing frequency folding into the unused sidebands does no harm. Consequently, a poorer
rejection up to $F_{si}/16 + F_{si}/80 = 3F_{si}/40$ can be tolerated, which greatly eases the FIR filter design.

Unfortunately, this processing is very inefficient in this direct form for several reasons. For 
instance, the filtering is done for each band and on complex data. Furthermore, resource consuming 
FIR filtering is performed on the frequency shifted data, but it is followed by decimation. In 
other words, samples are computed needlessly. These computing inefficiencies can be considerably 
improved by grouping the different frequency shifts and by using polyphase filters. 

4.2 Polyphase filters 

If x(n) is the input sample signal, the frequency shifted data stream for the band k, k=0..4, is
expressed by

$$x_k(n) = x(n) \cdot e^{-j\pi(2k+1)n/10} \tag{4.1}$$

The output, $x'_k(n)$, of the low pass FIR filter with coefficients a(p) is then

$$x'_k(n) = \sum_p a(p) \cdot x_k(n-p) = e^{-j\pi(2k+1)n/10} \sum_p a(p) \cdot x(n-p) \cdot e^{j\pi(2k+1)p/10} \tag{4.2}$$

Given that the filtered signal is down-sampled by a factor of 4, $x'_k(n)$ needs to be computed
only for n = 4m. By decomposing the filter into a 20-phase polyphase filter, where the coefficient
index p is given by p = q + 20r with q=0..19, equation (4.2) can be written in the following form

$$x'_k(4m) = e^{-j2\pi(2k+1)m/5} \sum_{q=0}^{19} e^{j\pi(2k+1)q/10} \cdot w_q(m) \tag{4.3}$$

where $w_q(m)$ is the output of the q-th phase of the polyphase filter:

$$w_q(m) = \sum_r a(q+20r) \cdot x(4m - q - 20r) \tag{4.4}$$

The final step is to up-convert the signal by $F_{so}/4$, which is equivalent to a complex
multiplication by $j^m$, and to take the real part of the resulting complex number:

$$y_k(m) = \mathrm{Re}\left[ j^m \cdot e^{-j2\pi(2k+1)m/5} \sum_{q=0}^{19} e^{j\pi(2k+1)q/10} \cdot w_q(m) \right] \tag{4.5}$$

The use of the polyphase decomposition of the FIR filter considerably reduces the computation
cost. However, it can be seen in equation (4.5) that many calculations still need to be done on
complex numbers before keeping only the real part. This leaves some margin for optimization.

4.3 Optimized reconstruction 

Since the inputs and outputs of the polyphase filter bank are real signals, it is possible to
perform all computations on real numbers only. Equation (4.5) can be re-written as

$$y_k(m) = \mathrm{Re}\left[ \sum_{q=0}^{19} e^{j\frac{\pi}{10}\left[5m + (2k+1)(q-4m)\right]} \cdot w_q(m) \right] \tag{4.6}$$



We can change the order of the polyphase filter outputs $w_q(m)$ in the sum by introducing new
data streams $w'_l(m) = w_q(m)$ with $l = (q + m) \bmod 20$. Due to the $2\pi$ periodicity of the complex
exponential function, the output of the filter bank can be expressed by the following formula:

$$y_k(m) = \mathrm{Re}\left[ (-1)^{km} \sum_{l=0}^{19} e^{j\pi(2k+1)l/10} \cdot w'_l(m) \right] = (-1)^{km} \sum_{l=0}^{19} \cos\left(\frac{(2k+1)\,l\,\pi}{10}\right) \cdot w'_l(m) \tag{4.7}$$

This simple rotation of the order of the polyphase filter outputs greatly simplifies the formula.
Moreover, each filter bank output can now be computed without complex arithmetic.
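
As a numerical cross-check of this derivation, the sketch below compares the direct complex form (shift, FIR, decimation, multiplication by $j^m$, real part) with the real-valued polyphase form of equations (4.4) and (4.7). It is only an illustration, assuming NumPy, a causal FIR with arbitrary coefficients whose length is a multiple of 20, and zero samples before the start of the input; the function names are hypothetical.

```python
import numpy as np


def direct_form(x, a, k):
    """y_k(m) from the direct complex form: shift, FIR, decimate, j^m, real part."""
    n = np.arange(len(x))
    xk = x * np.exp(-1j * np.pi * (2 * k + 1) * n / 10)
    filtered = np.convolve(xk, a)[:len(x)]        # causal FIR: sum_p a(p) * x_k(n - p)
    decimated = filtered[::4]                      # keep n = 4m only
    m = np.arange(len(decimated))
    return (np.exp(1j * np.pi * m / 2) * decimated).real   # j^m = exp(j*pi*m/2)


def polyphase_branches(x, a):
    """w_q(m) = sum_r a(q + 20r) * x(4m - q - 20r), with x = 0 for negative indices."""
    n_out = len(x) // 4
    w = np.zeros((20, n_out))
    m = np.arange(n_out)
    for q in range(20):
        for r in range(len(a) // 20):
            idx = 4 * m - q - 20 * r
            valid = idx >= 0
            w[q, valid] += a[q + 20 * r] * x[idx[valid]]
    return w


def optimized_bank(x, a):
    """All five outputs y_k(m) using real arithmetic only."""
    w = polyphase_branches(x, a)
    n_out = w.shape[1]
    k = np.arange(5)
    l = np.arange(20)
    cos_matrix = np.cos(np.pi * np.outer(2 * k + 1, l) / 10)   # 5 x 20 coefficients
    y = np.empty((5, n_out))
    for m in range(n_out):
        w_rot = w[(l - m) % 20, m]                 # rotation: w'_l(m) = w_{(l-m) mod 20}(m)
        y[:, m] = (-1.0) ** (k * m) * (cos_matrix @ w_rot)
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal(400)      # arbitrary real input, length multiple of 4
a = rng.standard_normal(180)      # arbitrary real coefficients, length multiple of 20
y_opt = optimized_bank(x, a)
max_error = max(np.max(np.abs(direct_form(x, a, k) - y_opt[k])) for k in range(5))
```

With random data, `max_error` stays at the level of floating point rounding, which is consistent with the two forms being algebraically identical.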

4.4 Full chain simulation 
4.4.1 Excitation DAC 

In order to validate the DAC choice and to select its best configuration for each band, namely the
digital modulator frequency and the half-band filters to engage, the DAC behavior was simulated.
The Frequency Tuning Word (FTW) configuring the modulator frequency is given by the formula
$FTW = \frac{f_{carrier}}{f_{nco}} \times 2^{32}$, where $f_{nco}$ is 500 MHz. Ideally, it is desired
to have five frequency bands and thus five different carrier frequencies going from 0 to 400 MHz
in steps of 100 MHz. Consequently, the first approach would be to select these exact values, which
are perfectly suited to the half-band filters. Unfortunately, these carrier frequencies would
yield non-integer FTW values. Using the rounded values would induce a small offset in
frequency, which would be observed as a slowly accumulating phase shift. Consequently,
the carrier frequencies were chosen to obtain $f_{carrier}/f_{nco}$ ratios of 0, 7/32, 13/32, 19/32
and 26/32, yielding integer FTW values.
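
The integer constraint on the FTW can be checked with exact rational arithmetic. The sketch below is an illustration under the assumption of a 32 bit tuning word (`FTW_BITS` is an assumed parameter, as is the helper name `ftw_exact`); it shows that the ideal 100 MHz carrier gives a fractional FTW while the chosen ratios give integers.

```python
from fractions import Fraction

FTW_BITS = 32                 # assumed width of the tuning word
F_NCO_HZ = 500_000_000        # modulator reference frequency (from the text)


def ftw_exact(ratio):
    """Exact (possibly fractional) FTW for a given f_carrier / f_nco ratio."""
    return Fraction(ratio) * 2 ** FTW_BITS


# Ideal carriers: multiples of 100 MHz, i.e. ratios 0, 1/5, 2/5, 3/5, 4/5.
ideal = [ftw_exact(Fraction(f, F_NCO_HZ)) for f in range(0, 500_000_000, 100_000_000)]
ideal_is_integer = [v.denominator == 1 for v in ideal]

# Chosen ratios from the text.
chosen_ratios = [Fraction(0), Fraction(7, 32), Fraction(13, 32), Fraction(19, 32), Fraction(26, 32)]
chosen_ftw = [ftw_exact(r) for r in chosen_ratios]
chosen_carriers_mhz = [float(r * F_NCO_HZ) / 1e6 for r in chosen_ratios]
```

Under these assumptions the chosen carriers evaluate to 0, 109.375, 203.125, 296.875 and 406.25 MHz.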

Given that the manufacturer provides the DAC half-band filter coefficients, a thorough
simulation of the DAC, with the appropriate filters selected, was conducted for the five excitation
bands. The results confirm that non-optimal carrier frequencies are acceptable. In particular, the
flatness is slightly degraded while the ripple remains below 0.06 dB.

This mandatory carrier frequency shift with respect to the ideal value must be pre-compensated
in the 'proc' FPGA building the excitation frequency comb by a digital modulator that applies a
frequency shift in the opposite direction, so as to virtually obtain carrier frequencies at the
requested values (from 0 to 400 MHz). The required compensations, expressed as a ratio of $F_{so}$,
are respectively: 0, -3/80, -2/80, +2/80 and -4/80.

This shift is performed at the same time as the frequency shift of $-F_{so}/20$ needed to
compensate the shift induced by the polyphase filter bank (see section 4.1). Consequently, the final
required compensations, again expressed as a ratio of $F_{so}$, are: -1/20, -7/80, -4/80, -3/80 and
-3/40. In practice, these are implemented with an 80-value sine and cosine table feeding the digital
modulator.
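
One plausible way to realize such a modulator with an 80-entry table is to step through the table with a stride equal to the desired shift in units of $F_{so}/80$. The sketch below is a hypothetical illustration, assuming NumPy and a complex (I + jQ) comb; `shift_comb` is not the firmware implementation.

```python
import numpy as np

TABLE_SIZE = 80
# One period of cosine (real part) and sine (imaginary part), 80 values each.
TABLE = np.exp(2j * np.pi * np.arange(TABLE_SIZE) / TABLE_SIZE)


def shift_comb(iq, shift_80ths):
    """Shift a complex I/Q stream by shift_80ths * F_so / 80 using table look-ups only."""
    n = np.arange(len(iq))
    return iq * TABLE[(shift_80ths * n) % TABLE_SIZE]


# A complex tone at 20/80 of the sampling rate, shifted by -7/80.
n = np.arange(800)
tone = np.exp(2j * np.pi * 20 * n / 80)
shifted = shift_comb(tone, -7)
peak_bin = int(np.argmax(np.abs(np.fft.fft(shifted))))   # bin 130 of 800, i.e. 13/80
```

Because the table holds exactly one period, any shift that is a multiple of $F_{so}/80$ is generated without phase truncation error.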

4.4.2 Polyphase filter 

As for the excitation DAC, the polyphase filter was simulated in order to assess its
performance and to find the best implementation options matching the available FPGA resources.
During the firmware design, the mathematical simulation tool was used to build stimulus files and
reference filter outputs that were used by the VHDL simulation tool to speed up the design and
validate the firmware implementation of the filter.

The simulation was also an asset in designing the FIR filter. As shown in figure 5, the
selected FIR has a good flatness over the useful bandwidth (<0.01 dB). The choice was made to
concede a larger than specified transition band [50-75 MHz] while having an excellent rejection
(-170 dB) in the stopband. As explained in section 4.1, possible resulting aliasing does not impact
DDC performance in the useful bandwidth. Additionally, the quantization noise due to the use of
fixed point Multiplier Accumulators (MAC) was evaluated and confirmed to be negligible with
respect to the quantization noise of the ADC.

A full polyphase filter simulation, where three tones (205 MHz, 250 MHz, 299 MHz) are
injected at the filter input, is shown in figure 6. The top left plot shows the input signal
spectrum, and the other plots show the frequency content of each output of the polyphase filter.
It can be observed that the expected tones lie in the expected band k=2, while the spurious tones
appearing in bands k=1 and k=3 are in their rejected sidebands, i.e. above 200 MHz for band k=1 and
below 300 MHz for band k=3.
Figure 5. Simulation of the selected FIR filter. The top left plot shows the global filter response. The top
right plot shows the gain fluctuation in the passband and the bottom plot shows the steep rejection after the
passband.

Figure 6. Full polyphase filter simulation, where three tones (205 MHz, 250 MHz, 299 MHz) are injected at
the filter input. The top left plot shows the input signal spectrum, and the other plots show the frequency
content of each output of the polyphase filter. It can be observed that the expected tones lie in the expected
band k=2, while the spurious tones appearing in bands k=1 and k=3 are in their rejected sidebands, i.e. above
200 MHz for band k=1 and below 300 MHz for band k=3.
5. Firmware development
5.1 FPGA 'split' description

Figure 7. Overview of the 'split' FPGA firmware. The firmware is divided in two main parts. The first part,
which is the key-point of the overall design, is composed of the ADC interface, the polyphase filter bank
and the five 'fakeADC' outputs, each carrying its share of the bandwidth to the dedicated 'proc' FPGA. The
second part consists of five GTX receivers that collect the I/Q data provided by each 'proc' FPGA, a data
concentrator, a large FIFO and a USB interface.

The 'split' FPGA, shown in figure 7, contains two main parts. The first part, which is the
key-point of the overall design, is composed of the ADC interface, the polyphase filter bank and
the five 'fakeADC' outputs, each carrying its share of the bandwidth to the dedicated 'proc' FPGA.
The second part consists of five GTX receivers that collect the I/Q data provided by each 'proc'
FPGA, a data concentrator, a large FIFO and a USB interface.

The GTX2IQ receiver blocks are designed to operate at a speed of 2 Gb/s. This is the speed
required to carry 32 bits at 50 MHz with 8b/10b encoding. Every ~1.05 ms ($2^{18}$ clock cycles at
250 MHz) a 644 byte data frame is received (see section 5.2) and stored in a small reception buffer
(1 k words deep). Once all GTX2IQ blocks have received their data frame, the 'data concentrator'
transfers the data of each link into the global data buffer labeled 'USB interface FIFO' (32 k words
deep) to make the complete data frame available for data acquisition via the USB interface.

The USB interface is mostly in charge of reading out the 'USB interface FIFO' and thus
of performing data acquisition. The required data throughput is 644 x 5 x 953 ≈ 3 MB/s. The
interface is also used to set the master/slave mode, to arm the acquisition, to select the modulation
mode and to configure and recover the status of the GTX transceiver links.
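
The figures quoted in the two paragraphs above can be reproduced with a few lines of arithmetic. This is only a back-of-the-envelope sketch using the numbers given in the text; the variable names are illustrative.

```python
CLOCK_HZ = 250e6             # system clock
INTEGRATION_CYCLES = 2 ** 18 # clock cycles per DDC integration
FRAME_BYTES = 644            # frame size per 'proc' FPGA (from the text)
N_PROC = 5                   # number of processing FPGAs

# Frame period and rate.
frame_period_s = INTEGRATION_CYCLES / CLOCK_HZ    # about 1.05 ms
frame_rate_hz = CLOCK_HZ / INTEGRATION_CYCLES     # about 953.7 Hz

# Serial line rate: 32 bit words at 50 MHz, each byte sent as 10 bits (8b/10b).
line_rate_bps = 32 * 50e6 * 10 / 8                # 2 Gb/s

# Payload carried by the I/Q results alone: 2 x 80 words of 32 bits.
iq_bytes = 2 * 80 * 4                             # 640 bytes of the 644 byte frame

# Aggregate throughput to be read out over USB.
usb_throughput_bytes_per_s = FRAME_BYTES * N_PROC * frame_rate_hz   # about 3.07e6 B/s
```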
The DAC modulation block is used to generate an optional modulation signal, which can be a
2 or 4 value modulation signal, depending on whether or not it is desired to compute the sensitivity
(first derivative) and the sensitivity variation (second derivative) of the I/Q measurement. When
this block is activated, the modulation signal is modified every integration cycle. To keep the
modulation synchronous with the integration performed in the DDC, the initial start of the
modulation is adjustable with a resolution of 4 ns, up to one full integration cycle.

The polyphase filter bank implementation (shown in figure 8) is composed of five successive
stages. During the design, several techniques were used to minimize the number of DSP48 blocks
used and hence to allow the filter to fit in the chosen FPGA.

The input stage is composed of a shift register bank featuring 20 registers of 12 bits. It receives
four new ADC samples every clock cycle and at the same time shifts the data by four samples,
from the newest to the oldest. At the output of this stage, samples n to n - 19 are provided
to the following stage.

The following stage is composed of 20 FIR filters, each processing one of the 'input stage'
outputs. The FIR filters feature 45 taps and are implemented in the transposed direct form, which
perfectly suits the possibilities offered by the DSP48 blocks inside the Virtex 6 FPGA. Given
that for each FIR filter only 9 taps out of 45 are non-zero, the zero coefficient taps are replaced
by simple registers. This technique alone makes it possible to use only 180 DSP48 blocks for the
whole filter bank instead of 450.

The third stage, named 'rotation block', is used to rotate the vector composed of the 20 FIR
filter outputs and to provide it to the 'optimized reconstruction block'. The rotation consists in
routing the data according to the following equation: $w'_l(m) = w_{(l-m) \bmod 20}(m)$, where l=0..19 and
m is the sample index. In practice, this is implemented with 20 high performance multiplexers having
20 inputs and one output. Each of these multiplexers is controlled by a counter with a 0 to 19
range, initialized with a value that depends on the 'optimized reconstruction block' input it is
connected to.
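Figure 8. Polyphase filter bank implementation overview.

According to the equation given in section 4.3, with the vectors Y(m), W''(m) and W'(m)
being respectively composed of $y_k(m)$, $w''_k(m)$ and $w'_l(m)$ for k=0..4 and l=0..19, the optimized
reconstruction can be computed by $Y(m) = J^m \cdot W''(m) = J^m \cdot A \cdot W'(m)$, where:

$$J = \left(\begin{array}{rrrrr}
1 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{array}\right) \tag{5.1}$$

and

$$A = \left(\begin{array}{rrrrrrrrrrrrrrrrrrrr}
1 & a & b & c & d & 0 & -d & -c & -b & -a & -1 & -a & -b & -c & -d & 0 & d & c & b & a \\
1 & c & -d & -a & -b & 0 & b & a & d & -c & -1 & -c & d & a & b & 0 & -b & -a & -d & c \\
1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 & 1 & 0 & -1 & 0 \\
1 & -c & -d & a & -b & 0 & b & -a & d & c & -1 & c & d & -a & b & 0 & -b & a & -d & -c \\
1 & -a & b & -c & d & 0 & -d & c & -b & a & -1 & a & -b & c & -d & 0 & d & -c & b & -a
\end{array}\right) \tag{5.2}$$

with:

$$a = \cos\left(\frac{\pi}{10}\right), \quad b = \cos\left(\frac{2\pi}{10}\right), \quad c = \cos\left(\frac{3\pi}{10}\right), \quad d = \cos\left(\frac{4\pi}{10}\right) \tag{5.3}$$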



It can be seen that computing $w''_2(m)$ does not need any multiplier, since the sign inversion
can be obtained simply by computing the two's complement of the input value. Moreover, by using
2x16 DSP48 slices to compute the multiplications by the coefficients other than 0 and ±1 in the
first two rows of the A matrix, the last two rows can be obtained by sign inversion only. The sign
inversion is applied to one out of two multiplications only (when l is odd). Figure 9 provides a
visual summary of the block implementation scheme. The whole processing must be pipelined and, as
opposed to a FIR filter, each single sample is multiplied by the 20 coefficients of each row and
then the operation results are all summed together. This requires the use of two types of pipelined
adders: one with ten inputs for $w''_2(m)$ (with four pipeline stages) and another with 18 inputs
for the others (with five pipeline stages). Finally, for a [5,20] x [20,1] matrix multiplication,
only 16 DSP48 slices are used.
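
The symmetries exploited here can be verified numerically from the cosine coefficients of the optimized reconstruction formula in section 4.3. The sketch below is a small illustration assuming NumPy; `count_true_multiplications` is a hypothetical helper that counts, for one row, the coefficients that are neither 0 nor ±1.

```python
import numpy as np

k = np.arange(5)[:, None]
l = np.arange(20)[None, :]
A = np.cos((2 * k + 1) * l * np.pi / 10)     # 5 x 20 reconstruction matrix

alternating = (-1.0) ** np.arange(20)        # +1 for even l, -1 for odd l


def count_true_multiplications(row, tol=1e-9):
    """Number of coefficients in a row that are neither 0 nor +/-1."""
    mag = np.abs(row)
    return int(np.sum((mag > tol) & (np.abs(mag - 1.0) > tol)))


row4_from_row0 = np.allclose(A[4], alternating * A[0])   # sign flip on odd l only
row3_from_row1 = np.allclose(A[3], alternating * A[1])
row2_is_trivial = count_true_multiplications(A[2]) == 0  # only 0 and +/-1
mults_row0 = count_true_multiplications(A[0])
mults_row1 = count_true_multiplications(A[1])
```

Rows 0 and 1 each contain 16 coefficients requiring a true multiplication, which matches the 2x16 products mentioned above.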

The last stage is actually associated with the previous stage ('optimized reconstruction output
stage'), but for the sake of clarity it is shown as a separate block. It corresponds to the first
term of equation (4.7), which performs an alternate sign inversion for the odd bands, resulting in a
frequency shift by half the sampling frequency and in a reversal of the frequency scale.

The whole design uses 216 out of 288 DSP48 blocks, 18442 out of 93120 slice registers and 
14879 out of 46560 slice LUT. 

5.2 FPGA 'proc' description 

As explained before, the processing FPGA, whose block diagram is shown in figure 10, is in charge
of generating the frequency comb in its share of the bandwidth and of performing the channelized
DDC for each tone considered.

The communication between the USB interface and the FPGA is ensured via a serial link
running at 50 MHz. The various commands received are interpreted by the 'proc_cmd' state machine.
Commands are of two kinds: write commands and read commands. The write commands
are used, for instance, to set the individual phase increment values and tone attenuations, the
digital gain and the mixing table to use for performing a frequency shift. Configuration and test
modes can also be set via this interface. Among the provided test modes, it is possible to record a
'fakeADC' signal snapshot of 32 k samples in the 'fakeADC_mem'. The read commands are used to
request data from the FPGA, such as the GTX link status or the 'fakeADC' link synchronization
status. The DAC internal register values can also be accessed.

Figure 9. Optimized operator taking advantage of the sign symmetry in the A matrix between rows k=0 and k=4
and between rows k=1 and k=3. For row k=2 no multiplier is needed and, since there is no phase requirement
between the different frequency bands, the delay adjustments needed to compensate the DSP48 latency are in
fact unnecessary.

Figure 10. Overview of the 'proc' FPGA firmware.

Given that the 'fakeADC' data emitted by the 'split' FPGA are synchronized by
the system wide reference clock, a dedicated interface (fakeADC_input) is used to adjust the
'fakeADC' bus delay in order to compensate the data sampling phase misalignment and thus
guarantee stable data sampling. The locally synchronized data are provided to the tone
managers.

The 80 tone manager outputs are fed to two pipelined adders in order to construct the in-phase
and quadrature versions of the frequency comb. Each comb version is then frequency shifted by
an IQ mixer in order to compensate the residual up-conversion due to the polyphase filtering and
the frequency shift due to the non-optimal selection of the DAC internal modulator frequency (see
section 4.4.1). The digital gain is used to numerically amplify the resulting signal by 0 dB up to
36 dB in steps of 6 dB before driving the DAC. This feature is useful to adapt the signal to the ADC
input range when fewer than 80 tones are used.

The IQ2GTX block is used to transmit the DDC results through the high speed link to the
'split' FPGA for data concentration. Along with these data, the detected peak amplitude, in
absolute value, is transmitted for monitoring and to avoid DAC clipping. Hence, the data frame is
composed of 2 x 80 32 bit words representing the in-phase/quadrature information.
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Figure 11. Overview of a tone manager. The block comprises a CORDIC generator, two digital attenuators 
for individual tone power adjustment and a DDC implemented with DSP48 blocks. 



The tone manager, which is depicted in figure 11, features a Coordinate Rotation Digital
Computer (CORDIC) [13] block and a DDC that is composed of an I&Q demodulator followed
by a Low Pass Filter (LPF). The LPF, which is primarily used to remove the sum-frequency
component from the spectrum, also rejects unwanted frequencies (e.g. frequencies
tuned to other pixels, white noise, ...). Each CORDIC, implemented in a pipelined fashion and
composed only of adders and subtracters, was designed to provide a 10 bit precision on the
calculated sine and cosine values. It uses 10 precalculated arc tangent values with 20 bit resolution.
The phase accumulator that feeds the CORDIC is used to adjust the frequency with a precision
of $250\ \mathrm{MHz}/2^{18} \approx 953\ \mathrm{Hz}$. In order to prevent all CORDICs from
starting in phase at the maximum cosine or sine amplitude, the phase accumulator is initialized at
a quarter of its full scale, i.e. each phase accumulator is reset to $\pi/4$.
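
The tone generation described above can be sketched in software as a phase accumulator feeding a shift-and-add CORDIC. The example below is a minimal model, not the firmware: the 18-bit accumulator width is inferred from the 953 Hz resolution, while the internal amplitude scale, the quadrant folding, the mapping of the accumulator full scale to one 2π turn and all names are assumptions.

```python
import math

F_CLK = 250e6    # sampling clock quoted in the text
ACC_BITS = 18    # accumulator width implied by 250 MHz / 2**18
N_ITER = 10      # one iteration per precalculated arctangent value
ANGLE_BITS = 20  # resolution of the arctangent table
AMPLITUDE = 1 << 14  # assumed internal amplitude scale (hypothetical)

# atan(2**-i) expressed in units of one turn / 2**ANGLE_BITS.
ATAN_TABLE = [round(math.atan(2.0 ** -i) / (2 * math.pi) * 2 ** ANGLE_BITS)
              for i in range(N_ITER)]
# The CORDIC rotations stretch the vector; pre-scale the start vector to compensate.
K = math.prod(1 / math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_ITER))


def tuning_word(freq_hz):
    """Phase increment per clock cycle for the requested tone frequency."""
    return round(freq_hz / F_CLK * 2 ** ACC_BITS) % (1 << ACC_BITS)


def cordic(phase):
    """Return (cos, sin) scaled by AMPLITUDE for an ACC_BITS-wide phase word."""
    full = 1 << ANGLE_BITS
    angle = (phase << (ANGLE_BITS - ACC_BITS)) % full
    if angle > full // 2:
        angle -= full                      # now in (-pi, pi]
    negate = False
    if angle > full // 4:                  # fold into [-pi/2, pi/2]
        angle -= full // 2
        negate = True
    elif angle < -(full // 4):
        angle += full // 2
        negate = True
    x, y = round(AMPLITUDE * K), 0
    for i in range(N_ITER):                # adders, subtracters and shifts only
        if angle >= 0:
            x, y, angle = x - (y >> i), y + (x >> i), angle - ATAN_TABLE[i]
        else:
            x, y, angle = x + (y >> i), y - (x >> i), angle + ATAN_TABLE[i]
    return (-x, -y) if negate else (x, y)


def tone_samples(freq_hz, n_samples):
    """Generate (cos, sin) pairs, starting at pi/4 as stated in the text."""
    word = tuning_word(freq_hz)
    phase = 1 << (ACC_BITS - 3)            # pi/4 under the assumed 2*pi full scale
    samples = []
    for _ in range(n_samples):
        samples.append(cordic(phase))
        phase = (phase + word) & ((1 << ACC_BITS) - 1)
    return samples
```

With ten iterations the residual rotation angle is bounded by atan(2^-9), roughly 0.002 rad, which is the order of magnitude of the 10 bit precision quoted above.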

The I&Q demodulation is performed by multiplying a copy of the ADC output by replicas of 
the generated sine and cosine values. For practical reasons (FPGA logic resources), the Low pass 
Filter (LPF) is obtained by averaging 2^^ data samples and it is thus in the order of the kHz of 
bandwidth. Note that the accumulation period must be chosen as a multiple of the phase
accumulator period in order to avoid beat frequency phenomena. At the end of the accumulation
cycle, each tone manager transfers its I&Q data to the IQ2GTX interface for transmission to the
'split' FPGA.
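
The demodulation and averaging steps can be illustrated with a short floating-point model. This is a sketch under stated assumptions: the averaging length `N_AVG` is deliberately small and hypothetical, the tone bins and amplitudes are arbitrary, and the function names are not taken from the firmware. The tones are placed on multiples of `F_CLK / N_AVG`, so that the averaging window spans an integer number of periods, which is the condition mentioned above for avoiding beat effects.

```python
import math

F_CLK = 250e6  # sampling clock quoted in the text
N_AVG = 4096   # averaging length, reduced here for speed (hypothetical value)


def make_tone(freq_hz, amplitude, phase, n_samples):
    """Real tone as it would appear at the ADC output."""
    return [amplitude * math.cos(2 * math.pi * freq_hz / F_CLK * n + phase)
            for n in range(n_samples)]


def ddc(samples, freq_hz):
    """Multiply by cosine and sine replicas, then average (boxcar low-pass filter)."""
    w = 2 * math.pi * freq_hz / F_CLK
    i_acc = 0.0
    q_acc = 0.0
    for n, x in enumerate(samples):
        i_acc += x * math.cos(w * n)
        q_acc += x * math.sin(w * n)
    # Factor 2: the product of two sinusoids puts only half the amplitude at DC.
    return 2.0 * i_acc / len(samples), 2.0 * q_acc / len(samples)


bin_width = F_CLK / N_AVG
f_pixel = 37 * bin_width   # tone assigned to the pixel of interest
f_other = 52 * bin_width   # tone assigned to another pixel
adc = [a + b for a, b in zip(make_tone(f_pixel, 0.8, 0.3, N_AVG),
                             make_tone(f_other, 0.5, 1.1, N_AVG))]

i_val, q_val = ddc(adc, f_pixel)
amplitude = math.hypot(i_val, q_val)   # recovers the amplitude of the pixel tone
phase = math.atan2(-q_val, i_val)      # recovers its phase
```

In this idealized case both the sum-frequency term and the neighbouring tone average out over the window, so only the tone of interest contributes to `i_val` and `q_val`.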

To allow individual tone power adjustment, the sine and cosine waves are passed through digital
attenuators before being provided to the block output. Tones can be tuned in a range extending up
to 8/8, with a resolution of 1/8th of the input power.

The whole design uses 164 out of 288 DSP48 blocks, 60412 out of 93120 slice registers and
43508 out of 46560 slice LUTs.

6. Prototype performance 
6.1 System frequency response 

The frequency response of the system was measured for the in-phase and quadrature outputs of the
board in loop-back mode, i.e. with one of the board outputs connected directly to the board ADC input.
For each measurement, 400 tones uniformly distributed over the system bandwidth were generated
and analyzed by the embedded DDC. The amplitude of each tone is plotted in figure 12.

The expected juxtaposition of the five frequency bands of 100 MHz, corresponding to each 
DAC contribution, can be observed on the plot. The maximum amplitude variation observed over 
the full bandwidth is less than 6 dB. 
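
For illustration, a uniform comb of 400 tones over five 100 MHz bands could be laid out as in the sketch below. The placement of each tone at the centre of its 1.25 MHz slot and the quantization of the in-band offset to the 953 Hz phase-accumulator step are assumptions made for the example; the actual tone plan used for the measurement is not specified here.

```python
F_CLK = 250e6                        # sampling clock quoted in the text
ACC_BITS = 18                        # accumulator width implied by the 953 Hz step
RESOLUTION = F_CLK / 2 ** ACC_BITS   # about 953.67 Hz
N_BANDS = 5
BAND_WIDTH = 100e6
TONES_PER_BAND = 80


def tone_plan():
    """Return one list of tone frequencies (in Hz) per 100 MHz band."""
    spacing = BAND_WIDTH / TONES_PER_BAND      # 1.25 MHz
    plan = []
    for band in range(N_BANDS):
        band_start = band * BAND_WIDTH
        tones = []
        for k in range(TONES_PER_BAND):
            offset = (k + 0.5) * spacing       # centred in its slot (assumption)
            word = round(offset / RESOLUTION)  # phase-accumulator increment
            tones.append(band_start + word * RESOLUTION)
        plan.append(tones)
    return plan


plan = tone_plan()
all_tones = [f for band in plan for f in band]
```

Each tone then lies within half a resolution step of its ideal position, so the comb remains uniform to better than 1 kHz.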

We attribute the amplitude variation to several factors. Part of the dispersion is due to the
tolerances of the active and passive electronic components. For instance,
the DAC gain has a worst-case dispersion of ±3.6 %, while the DAC full-scale current resistor has a
dispersion of ±1 %. There is also the dispersion of the resistors in the passive combiner and
the loss dispersion of the balun transformers (not documented). Additionally, the balun transformers
have a frequency-dependent loss (-2 dB at 500 MHz), which partly explains the decreasing trend of
the curve. It may also be noted that the original board design used summing amplifiers
to sum the five DAC signals (I and Q). Unfortunately, they caused distortion and picked up
noise from the power supplies. Consequently, they were replaced by passive combiners. This
modification required wire straps to bypass the amplifiers, which certainly
induce attenuation as the frequency increases.

Besides, the DAC outputs of the 100-300 MHz and 400-500 MHz bands were not routed on the
outer PCB layers (as microstrips) but in the inner FR4 layers (as striplines), and thus they have
higher dielectric losses. From the dielectric manufacturer specifications, a loss difference of 0.2 dB
can be observed between FR4 and RO4350 microstrips.
Figure 12. System frequency response measured for the in-phase and quadrature outputs of the
board in loop-back mode, i.e. with one of the board outputs connected directly to the board ADC input.
For each measurement, 400 tones uniformly distributed over the system bandwidth were generated and
analyzed by the embedded DDC.



Finally, some routing choices were not optimal (bends, stubs, ...) and they certainly cause
small impedance variations along the lines, which induce transmission losses as well.

Even though the fluctuation is not fully explained, it remains acceptable for such a
bandwidth. Moreover, it can be corrected by applying a tone-by-tone power adjustment.

6.2 System noise 

As shown in [@], the main system noise contributors in the KID readout electronic chain are the 
RF mixing electronics and the cold amplifier. Consequently, this prototype was also tested in
loop-back mode to measure its noise power spectrum distribution. The measurements were performed for
one tone generated in the middle of each frequency band and at different output power levels. The
output level was digitally adjusted with the digital gain module available in each 'proc' FPGA (see
figure |Ty). The highest signal level reached by this method was just slightly above midscale for the 
2^ gain. 

For each tone and each digital gain setting, 6000 points were recorded at 23.84 Hz and
windowed with a Hann function. The Fast Fourier Transform (FFT) of the resulting data was then
computed, and the 6 dB loss due to the windowing function was compensated. Finally, the
resulting spectrum was smoothed by FFT filtering (20 bins kept).
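
The 6 dB compensation follows from the coherent gain of the Hann window, whose mean value is 0.5, i.e. 20 log10(0.5) ≈ -6.02 dB. The sketch below illustrates this on a short real test tone with a naive DFT; the record length, the test signal and the function names are hypothetical, and the final smoothing step is not reproduced.

```python
import cmath
import math


def hann(n_points):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points) for n in range(n_points)]


def amplitude_spectrum_db(samples):
    """Single-sided amplitude spectrum in dB, corrected for the window loss."""
    n = len(samples)
    window = hann(n)
    coherent_gain = sum(window) / n   # 0.5 for a Hann window
    spectrum_db = []
    for k in range(n // 2 + 1):
        # Naive DFT, adequate for a short illustrative record.
        acc = sum(x * w * cmath.exp(-2j * math.pi * k * m / n)
                  for m, (x, w) in enumerate(zip(samples, window)))
        amp = abs(acc) / (n * coherent_gain)
        if 0 < k < n / 2:
            amp *= 2.0                # fold negative frequencies of a real signal
        spectrum_db.append(20.0 * math.log10(max(amp, 1e-300)))
    return spectrum_db


N_POINTS = 128   # short record for the example (the text uses 6000 points)
TONE_BIN = 10    # unit-amplitude test tone centred on a DFT bin
signal = [math.cos(2 * math.pi * TONE_BIN * m / N_POINTS) for m in range(N_POINTS)]
spectrum = amplitude_spectrum_db(signal)
window_loss_db = 20.0 * math.log10(sum(hann(N_POINTS)) / N_POINTS)
```

After the correction, a unit-amplitude tone centred on a bin is read back at its true level, whereas without it the peak would sit about 6 dB too low.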

Figure 13 shows the system noise Power Spectrum Distribution (PSD) for one tone in each
frequency band. With the exception of tone 4, all tones have a similar Signal to Noise Ratio (SNR).
This is compatible with the board losses mentioned previously, which reduce the signal amplitude by
about 6 dB.
Figure 13. Power spectrum distribution plot showing the system noise for one tone in each band. With
the exception of tone 4, all tones have a similar Signal to Noise Ratio (SNR).

Figure 14 shows the system noise PSD for a given tone but for different excitation signal
amplitudes. The noise floor (relative to carrier) is seen to increase accordingly with each amplitude
decrease. It may be noticed that when all tones are activated in a single band, it is possible to keep 
a digital gain between 2^ and 2^ without DAC clipping because of the frequency values random 
distribution which minimizes the risk to sum all tones at their maximum amplitude at the same time. 
Therefore, the 2' and 2^ gain curves, provide the achievable performance when the full capabilities 
of the board are used. 

Figure 14. Power spectrum distribution for a given tone but for different signal amplitudes. The noise
floor (relative to carrier) can be seen to increase with each amplitude decrease.

7. Conclusion

We have presented in this paper a first prototype of the NIKEL electronic board, which was
specifically designed for the NIKA camera to be installed at the IRAM 30 m telescope at Pico Veleta,
Spain. We have shown that NIKEL is able to perform real-time frequency multiplexing of an
array of up to 400 MKIDs over a bandwidth of 500 MHz with outstanding noise performance.
This is due to an innovative solution based on splitting the original 500 MHz band
into five bands of 100 MHz each, thanks to state-of-the-art electronic components and sophisticated
numerical filtering algorithms. The NIKEL multiplexing factor is three times larger than that of
previous single-board systems, and it opens a clear path towards the exploitation and monitoring
of future kilo-pixel arrays of MKIDs. The resulting minimization of the cable count
towards the cryogenic system is a further asset. Such large arrays will undoubtedly be a serious
alternative to standard bolometric techniques for millimeter astronomy, both because of the intrinsic
quality of MKIDs (low noise and fast response) and because of their large multiplexing capabilities.